Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)

where m is a posterior mean vector, defined as follows:

$$\mathbf{m} = \beta\left(\beta\mathbf{X}\mathbf{X}^{t} + \alpha\mathbf{I}\right)^{-1}\mathbf{X}\mathbf{y} \tag{3.26}$$
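As a rough illustration of Equation (3.26), the sketch below computes m in plain Python (used here only for illustration; it is not the book's R code). It assumes that X is stored with one feature per row and one sample per column, so that the product of X with its transpose is square, and that alpha and beta are the scalar hyperparameters of the surrounding derivation. The names `posterior_mean` and `solve_linear` are hypothetical helpers. Instead of forming the inverse explicitly, the code solves the equivalent linear system (beta X X^t + alpha I) m = beta X y.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are left untouched.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def posterior_mean(X, y, alpha, beta):
    """Posterior mean m of Equation (3.26).

    Assumption: X is a list of d rows (features), each with n entries
    (samples); y is a list of n targets; alpha and beta are scalars.
    """
    d, n = len(X), len(y)
    # A = beta * X X^t + alpha * I, a d x d matrix.
    A = [[beta * sum(X[i][k] * X[j][k] for k in range(n))
          + (alpha if i == j else 0.0)
          for j in range(d)]
         for i in range(d)]
    # Right-hand side beta * X y, a vector of length d.
    rhs = [beta * sum(X[i][k] * y[k] for k in range(n)) for i in range(d)]
    return solve_linear(A, rhs)


if __name__ == "__main__":
    # Toy data: a constant row plus one feature, with y = 1 + 2 * feature.
    X = [[1.0, 1.0, 1.0],
         [0.0, 1.0, 2.0]]
    y = [1.0, 3.0, 5.0]
    print(posterior_mean(X, y, alpha=2.0, beta=1.0))
```

With alpha set to zero the result reduces to the ordinary least-squares solution for the toy data; a larger alpha pulls the entries of m towards zero.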

Evaluation and generalisation of a supervised machine learning model

Discriminant analysis and classification analysis belong to the supervised machine learning category. A critical issue for a supervised machine learning model is whether it has been properly evaluated before it is put to real use, i.e., used for inference on new data. There are two main approaches for evaluating a discriminant analysis model: the confusion matrix, for fixed-point evaluation, and receiver operating characteristic analysis, for robustness evaluation. The other issue with a supervised machine learning model is whether its performance has been well tested using a novel data set. This is called the generalisation test of a supervised machine learning model. This section addresses these two practical issues.

Confusion matrix

A confusion matrix is a fixed-point evaluation approach [Stehman, 1997]. The outputs of most discriminant models are continuous. Therefore, a threshold is required to convert the continuous model output variable ($\hat{y}$) to a binary prediction class variable. Suppose a threshold ($\delta$) has been chosen: $\hat{y} < \delta$ will lead to one class label, such as zero, and $\hat{y} \ge \delta$ will lead to the other class label, such as one.
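The thresholding rule can be sketched in a few lines of Python (used here only for illustration; it is not the book's R code). The function names `to_labels` and `confusion_table`, the toy outputs and the threshold value of 0.5 are all assumptions made for this example. The second function cross-tabulates actual against predicted labels, which is the counting step behind a confusion matrix.

```python
def to_labels(y_hat, delta, low=0, high=1):
    """Map continuous outputs to class labels: low if y_hat < delta, else high."""
    return [high if value >= delta else low for value in y_hat]


def confusion_table(actual, predicted, labels=(0, 1)):
    """Count data points for every (actual label, predicted label) pair.

    The result is a nested dict: table[actual_label][predicted_label].
    """
    table = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        table[a][p] += 1
    return table


if __name__ == "__main__":
    y_hat = [0.1, 0.4, 0.5, 0.9]   # hypothetical continuous model outputs
    actual = [0, 1, 1, 1]          # hypothetical true class labels
    z = to_labels(y_hat, delta=0.5)
    print(z)                       # [0, 0, 1, 1]
    print(confusion_table(actual, z))
```

Note that an output exactly equal to the threshold falls into the upper class, matching the "greater than or equal to" condition in the text. Changing the threshold changes the counts in the table, which is why a single confusion matrix evaluates the model at one fixed point only.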

After a discriminant model has been constructed and predictions have been made, an output (prediction) table will be obtained, as shown in Table …. Suppose a threshold has been decided. The prediction variable $\hat{y}$ is converted to the prediction label variable Z. It is also supposed that the data set is composed of two classes of data points, which are labelled A and …. Table 3.2 shows such a prediction table, in which two columns were